Clustering at 74 MHz
In order to construct accurate point-source simulations at the frequencies relevant to 21 cm
experiments, the angular correlation of radio sources must be taken into account. Using the 74
MHz VLSS survey, we measured the angular 2-point correlation function, $w(\theta)$. We obtain the
first measurement of clustering at the low frequencies relevant to 21 cm tomography. We find that
a single power law of the form $w(\theta) = A\theta^{-\gamma}$ fits the data well. For a galactic cut of
$|b| > 10^\circ$, with a data cut of $\delta > -10^\circ$, and a flux limit of S = 770 mJy, we obtain a
slope of $\gamma = -1.2 \pm 0.35$. This value of $\gamma$ is consistent with that measured from other
radio catalogues at millimeter wavelengths. The amplitude of clustering has a length of 0.2°-0.6°,
and it is independent of the flux-density threshold.



I. INTRODUCTION 

Progress in detector, space and computer technology 
has triggered an avalanche of high-quality cosmological 
data, removing cosmology from the realm of philosophy 
and transforming it into a quantitative empirical science. 
In the past few years, many authors have argued that
21 cm tomography, i.e., the three-dimensional mapping
of highly redshifted 21 cm emission, will be the ultimate
cosmological probe - see, e.g., 0, i, i, fl [S B S i, !, EE
[ill ]. Although this signal has yet to be detected, there
is a theoretical consensus that the 21 cm signal must be
out there and would be extremely useful if measured.

El, [13, [H, El, [H, da 



Although ambitious experimental efforts in 21 cm
tomography are now in progress across the globe (see
Table I), it is widely understood that the cosmological
results of these experiments will only be as good as
our ability to deal with (or to remove) foreground
contamination pj, [H |H El, S3, [H, H. The goal of this
work is to support these worldwide experimental efforts
by tackling the foreground issue.

Understanding the physical origin of Galactic metre 
wavelength emission is interesting for two reasons: to 
determine the fundamental properties of the Galactic 
components, and to refine the modeling of foreground 
emission for cosmological 21 cm experiments. At me- 
tre wavelengths, the bulk of foreground contamination 
is due to synchrotron emission. When coming from ex- 
tragalactic objects, this radiation is usually referred to 
as point source contamination and affects mainly small 
angular scales. When coming from the Milky Way, this 
diffuse Galactic emission fluctuates mainly on large an- 
gular scales [2lj . 

Normal galaxies, radio galaxies and active galactic
nuclei form the majority of extragalactic continuum sources
[25]. A number of surveys of radio sources have been
performed at frequencies relevant to 21 cm tomography
(see Table II), and analyses of these catalogues have helped
to bring some understanding of their statistical
properties: the distribution of radio sources is found to obey
Poisson statistics with very weak observed angular
clustering (see Table III).

FIG. 1: The 74 MHz VLSS catalogue (top), and one of our 74
MHz mock catalogues (bottom). Both catalogues are plotted in
the interval 0 < S < 5 Jy, and in Galactic coordinates with the
Galactic center at the origin and longitude increasing to the left.
The mock catalogue also shows the final area used in our analysis
($\delta > -10^\circ$ and $|b| > 10^\circ$).

Some aspects of both experimental design optimization 
and actual data analysis require full-blown simulations 
of the sky signal and knowledge about how it propagates 
through the instrument and the data analysis pipeline — 
this has motivated the ambitious simulation efforts car- 
ried out by, e.g., WMAP and Planck. End-to-end simu- 
lations are at least as important for 21 cm experiments 
because of the many complicated issues related to
instrumental performance, ionospheric turbulence corrections,
etc. [lj| [H, [27], [H, [2^] . In order to construct accurate
simulations at metre wavelengths, the angular
correlation of radio sources must be taken into account [3fj| .
It is important to point out that the relative importance
of the clustering contribution increases and may
eventually become dominant if sources are identified and
subtracted down to faint flux limits [28] - which are exactly
the limits involved in the point source removal of 21 cm 
experiments.

TABLE I: 21 cm Tomography Experiments.

Experiment    FWHM         ν [MHz]          Receiver                 Sensitivity   Effective Area [m^2]   Site - yr
GMRT          3.8°-0.4°    50-1420          30 dishes                -             5 × 10^4               India - 2007
PAST/21CMA    3'           50-200           10,000 antennas          15 mK/√day    7 × 10^4               Ulastai, CH - 2007
LOFAR         25"-3.5"     10-240           25,000 dipole antennas   -             1 × 10^5               Drenthe, NL - 2007
MWA           15'          80-300           8,192 dipole antennas    -             1 × 10^4               Murchison, AU - 2007
PAPER         -            110-200          16 antennas              -             1 × 10^4               USA/AU - 2008
SKA           0.1"         100 MHz-25 GHz   -                        -             -                      AU(?) - 2015(?)


GMRT = Giant Metrewave Radio Telescope, see http://www.gmrt.ncra.tifr.res.in/ 
PaST/21CMA = PrimevAl Structure Telescope, see http://web.phys.cmu.edu/~past/ 
LOFAR = LOw Frequency ARray, see http://www.lofar.org 
MWA = Murchison Widefield Array, see http://www.haystack.mit.edu/ast/arrays/mwa/index.html 
PAPER = Precision Array to Probe Epoch of Reionization, see http://astro.berkeley.edu/~dbacker/eor 
SKA = Square Kilometer Array, see http://www.skatelescope.org 



TABLE II: Publicly available point source catalogues at the frequencies relevant to 21 cm tomography.

Ref       ν [MHz]   Region (α)       Region (δ)        FWHM [arcmin]   S_comp [Jy]   S_min [Jy]   N_obj     Observatory       Status
[31]      38        00h < α < 24h    +60° < δ < +90°   4.5             -             1            5859      CLFST, ENG        A
[32]      60        00h < α < 24h    +55° < δ < +55°   450             -             12           100       Pushchino, RUS    B
[58]      74        00h < α < 24h    -30° < δ < +90°   1.33            0.77          0.1          68311     VLA, USA          A
[33]      80        00h < α < 24h    -49° < δ < +37°   3.7             -             2            999       Culgoora, ENG     A
[33]      80        00h < α < 24h    -49° < δ < +37°   3.7             -             2            1748      Culgoora, ENG     A
[34]      81        00h < α < 24h    +70° < δ < +90°   10              -             1            558       Cambridge, ENG    B
[35]      102       00h < α < 24h    +27° < δ < +70°   60              -             3            920       LPA, RUS          A
[36]      150       18h < α < 24h    -70° < δ < -10°   4.6             2             0.96         2784      MRT, India        A
[37]      151       00h < α < 24h    +30° < δ < +90°   4.2             -             0.13         34418     CLFST, ENG        A
[38]      151       00h < α < 24h    +21° < δ < +90°   1.2             -             0.120        43689     CLFST, ENG        A
[39]      159       00h < α < 24h    -22° < δ < +71°   10.0            -             7            471       Cambridge, ENG    A
[33]      160       00h < α < 24h    -49° < δ < +37°   1.85            -             1.2          2041      Culgoora, ENG     A
[40]      178       00h < α < 24h    -90° < δ < -05°   6.0             -             5            11000     Cambridge, ENG    A
[41,42]   178       00h < α < 24h    -07° < δ < +80°   11.5            -             2            4844      4C Array, ENG     A
[43]      232       00h < α < 24h    +30° < δ < +90°   3.8             -             0.1          34426     MSRT, CHI         A
[44]      325       00h < α < 24h    +30° < δ < +90°   0.9             -             0.1          229420    WSRT, NLD         A
[45]      352       00h < α < 24h    -09° < δ < +26°   0.9             -             0.010        84481     WSRT, NLD         A
[46]      365       00h < α < 24h    +36° < δ < +72°   0.1             -             0.25         66841     UTRAO, USA        A

S_comp = Limit of completeness.
S_min = Smallest flux value.
N_obj = Number of sources in the catalogue.
A = Publicly available in digital form.
B = Available as printed table (which we will OCR).

TABLE III: Published $w(\theta)$ values (see note 1).

Ref        ν [GHz]   A [×10^-3]    γ              w(θ) [°]    S_min [mJy]
[47]       0.178     -             -              1.50-3.0    3000
[48]       0.325     1.0 ± 0.4     1.22 ± 0.33    > 0.2       35
[49]       0.408     -             -              -           250
[50]       0.408     -             -              -           10
[48]       0.843     2.0 ± 0.4     1.24 ± 0.16    > 0.2       10
[51]       1.400     2.6 ± 0.8     1.2 ± 0.1      0.07-4.0    3
[52]       1.400     1.1 ± 0.1     1.5 ± 0.1      > 0.07      3
[53]       1.400     1.0 ± 0.3     0.9 ± 0.2      > 0.07      3
[53, 54]   1.400     1.0 ± 0.2     0.7 ± 0.1      > 0.1       10
[48]       1.400     1.5 ± 0.2     1.05 ± 0.10    > 0.3       10
[49]       2.7       -             -              -           350
[55]       4.850     -             -              0.70-1.7    45
[56]       4.850     -             0.8            0.30-1.9    35
[57]       4.850     10.0 ± 5.0    0.8            0.01-1.0    50

1 $w(\theta)$ is fitted by a power law of the form $A\theta^{-\gamma}$.
S_min = Smallest flux value.



In this paper, we present measurements of the angular
2-point correlation function, $w(\theta)$, from the 74 MHz
VLSS survey [58]. We obtain the first measurement of
clustering at the low frequencies relevant to 21 cm
tomography. In Section II we describe the statistical tools
used in this analysis, as well as the 74 MHz VLSS survey.
In Section III we describe our results, and in Section IV
we present our conclusions.



II. DATA ANALYSIS TOOLS 



A. The Angular 2-point Correlation Function

FIG. 2: A comparison between point source sensitivity and
resolution of the 74 MHz VLSS survey (in red) and other low
frequency surveys (see Table II).

In recent years, the analysis of the correlation function
has become the standard way of quantifying the clustering
of different populations of astronomical sources.
Specifically, the angular two-point correlation function
$w(\theta)$ gives the excess probability $\delta P$, in comparison to a
random Poisson distribution, of finding two sources in the
solid-angle elements $\delta\Omega_1$ and $\delta\Omega_2$ separated by the
angle $\theta$. $\delta P$ is defined as

$$\delta P = N^2\,\delta\Omega_1\,\delta\Omega_2\,[1 + w(\theta)], \tag{1}$$

where $N$ is the mean number density of objects in the
catalogue under consideration [59].

Many derivations of estimators of $w(\theta)$ can be found
in the literature (see, e.g., [59], [60], [61]). One way to
estimate this function is to compare the distribution of
the objects in the real catalogue to the distribution of
points in a random Poisson distributed catalogue with
the same boundaries, or

$$w(\theta) = \frac{DD(\theta)\,RR(\theta)}{[DR(\theta)]^2} - 1 \tag{2}$$

[61], where $DD(\theta)$, $RR(\theta)$ and $DR(\theta)$ are the numbers of
data-data, random-random and data-random pairs separated
by an angle between $\theta$ and $\theta + \delta\theta$. It is important to
remember that the estimation of $RR(\theta)$ and $DR(\theta)$ requires a
catalogue of objects scattered uniformly over an area with
the same angular boundaries as the data catalogue.



B. Mock Catalogues

We used the "Sphere Point Picking Algorithm" [62]
to generate random Cartesian vectors uniformly distributed
on the surface of a unit sphere (to avoid having vectors
"bunched" around the poles, as would happen if one
drew the spherical angles uniformly instead).
Accordingly, we calculate these vectors as

$$x = \sqrt{1 - u^2}\,\cos\theta \tag{3}$$
$$y = \sqrt{1 - u^2}\,\sin\theta \tag{4}$$
$$z = u, \tag{5}$$

where $u = \cos\phi$, with $\theta \in [0, 2\pi)$ and $u \in [-1, 1]$ [63]. In
order to obtain points such that any small area on the
sphere is expected to contain the same number of points,
we choose $u$ and $v$ to be random variates in the interval
[0, 1]. Therefore, we calculate $\theta$ and $\phi$ from

$$\theta = 2\pi u \tag{6}$$
$$\phi = \cos^{-1}(2v - 1). \tag{7}$$

Note that $u$ in Eq. (6) is a uniform variate on [0, 1], whereas
$u$ in Eqs. (3)-(5) stands for $\cos\phi = 2v - 1$.

Using the equations above we generate a position in the
random catalogue. If this position is inside the boundaries
of the data catalogue, then a flux value drawn from the
data catalogue is associated with that random vector.
This procedure is repeated until the random catalogue
has the same number of "objects" as the data catalogue.
This method, also known as "bootstrapping", involves
resampling the data at random, with replacement,
to construct a new data set whose population distribution
is identical to that of the original data set. Figure 1
shows a realization of one of our mock catalogues.

C. VLSS: The VLA Low-Frequency Survey

The VLA Low-frequency Sky Survey (VLSS, formerly
known as 4MASS) is a 74 MHz (or 4 meter wavelength)
continuum survey carried out by the National Radio
Astronomy Observatory (NRAO) and the Naval Research
Laboratory (NRL). The aim of the survey is to map an
area of $3\pi$ sr covering the entire sky north of -30°
declination at a resolution of 80" (FWHM), with an average noise
level of 0.1 Jy/beam. The principal data product is a
set of 358 continuum images of 14° × 14°, and a
catalogue with 68,311 discrete sources [Hj]. The VLSS
catalogue was created by fitting elliptical Gaussians to all the
sources that are detected at the 5 sigma level or higher
[HI, and it is complete at the 770 mJy level. The 74
MHz catalogue is shown in Figure 1 (top), and a
comparison of this survey with other low-frequency surveys can
be seen in Table II and Figure 2.

FIG. 3: Measured $w(\theta)$ for different galactic cuts. All
angular correlations are calculated at the flux limit of S = 770
mJy. The red lines are single power-law fits to the data,
where $w(\theta) = A\theta^{-\gamma}$, and the yellow shaded regions are $w(\theta)$
calculated using solely mocks.

FIG. 4: The signal as a function of galactic latitude for a
constant galactic longitude of $l = 120^\circ$. Green and yellow
shades enclose the regions $|b| < 20^\circ$ and $|b| < 10^\circ$,
respectively. Sources within these shaded regions are masked from
our analysis as they may be galactic in origin.

III. RESULTS



In Figure 3 we present our measurement of $w(\theta)$ for
the flux limit of S = 770 mJy (black squares), which is
the completeness limit of the VLSS catalogue. Distances
between data and/or random sources are measured in
bins of 0.09°, which is safely above the VLSS resolution
limit of 0.02°. We also investigated whether $w(\theta)$ changes
with bin size, and we found no indication that the
bin size affects our results.
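
The binned pair counting and the estimator of Eq. (2) can be sketched as follows. This is a minimal illustration, not the pipeline used for the measurement: the function names (`unit_vectors`, `pair_counts`, `w_theta`) are hypothetical, and the pair counts are assumed to be normalised by the total number of pairs so that an unclustered sample gives $w(\theta) \approx 0$. The brute-force separation matrix is only practical for a few thousand sources.

```python
import numpy as np

def unit_vectors(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to unit vectors, shape (n, 3)."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def pair_counts(a, b, edges_deg, auto=False):
    """Fraction of pairs (one point from a, one from b) in each angular bin.

    With auto=True, a and b are the same catalogue and only the
    unique pairs i < j are counted.
    """
    cos_sep = np.clip(a @ b.T, -1.0, 1.0)
    sep = np.degrees(np.arccos(cos_sep))
    if auto:
        sep = sep[np.triu_indices(len(a), k=1)]
    else:
        sep = sep.ravel()
    counts, _ = np.histogram(sep, bins=edges_deg)
    return counts / sep.size

def w_theta(data, random, edges_deg):
    """Estimator of Eq. (2), w = DD * RR / DR**2 - 1, with normalised counts."""
    dd = pair_counts(data, data, edges_deg, auto=True)
    rr = pair_counts(random, random, edges_deg, auto=True)
    dr = pair_counts(data, random, edges_deg)
    with np.errstate(divide="ignore", invalid="ignore"):
        return dd * rr / dr**2 - 1.0
```

Bins of 0.09° out to, say, 2° would correspond to `edges_deg = np.arange(0.0, 2.0 + 0.09, 0.09)`; the upper limit of 2° is an illustrative choice. For a catalogue of tens of thousands of sources the full separation matrix would need to be replaced by chunked or tree-based pair counting.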

As shown in Figure 4, there are sources in the VLSS
catalogue that may be galactic in origin. In this figure,
we plot the source fluxes at galactic longitude $l = 120^\circ$ as
a function of galactic latitude $b$. The green and yellow
shades enclose the regions $|b| < 20^\circ$ and $|b| < 10^\circ$,
respectively. To reduce contamination from galactic sources,
we discarded regions inside chosen Galactic cuts; we also
discarded regions with $\delta < -10^\circ$, due to the patchy sky
coverage of VLSS (see Figure 1, top).

From top to bottom, Figure 3 shows the measured
$w(\theta)$ for different galactic cuts. We detected no
correlation for cuts smaller than 10° and, above this limit,
there are no large variations in $w(\theta)$. Since we want to
maximize the number of sources used in our statistics,
from here on all final calculations are for a 10° galactic
cut (i.e., for 39,118 sources). A galactic cut of 10° (or
larger) also excludes the "blank" regions in the VLSS
survey (see Figure 1). These are regions around, e.g.,
Cas A and Cyg A.
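
The cuts described above can be sketched as a boolean mask on the catalogue. This is an illustration under stated assumptions: the catalogue is assumed to provide J2000 right ascension and declination in degrees and fluxes in Jy, the function names are hypothetical, and the north Galactic pole coordinates are the standard J2000 values rather than anything quoted in the text.

```python
import numpy as np

# J2000 coordinates of the north Galactic pole, in degrees (standard values,
# not taken from the paper).
RA_NGP, DEC_NGP = 192.85948, 27.12825

def galactic_latitude(ra_deg, dec_deg):
    """Galactic latitude b in degrees from J2000 (ra, dec) in degrees."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    ra0, dec0 = np.radians(RA_NGP), np.radians(DEC_NGP)
    sin_b = (np.sin(dec) * np.sin(dec0)
             + np.cos(dec) * np.cos(dec0) * np.cos(ra - ra0))
    return np.degrees(np.arcsin(np.clip(sin_b, -1.0, 1.0)))

def analysis_mask(ra_deg, dec_deg, flux_jy, b_cut=10.0, dec_min=-10.0, s_min=0.77):
    """True for sources with |b| > b_cut, dec > dec_min and flux >= s_min."""
    b = galactic_latitude(ra_deg, dec_deg)
    dec = np.asarray(dec_deg, dtype=float)
    flux = np.asarray(flux_jy, dtype=float)
    return (np.abs(b) > b_cut) & (dec > dec_min) & (flux >= s_min)
```

The defaults reproduce the 10° galactic cut, the declination cut at -10° and the 770 mJy flux limit; passing `b_cut=15.0`, `20.0` or `25.0` would give the other galactic cuts considered.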

TABLE IV: $w(\theta)$ results (see note 1).

|b|    A                γ               w(θ) [°]   S_min [mJy]   χ^2
10°    0.103 ± 0.026    -1.21 ± 0.35    0.2-0.6    770           0.62
15°    0.062 ± 0.011    -1.81 ± 0.47    0.2-0.6    770           0.73
20°    0.041 ± 0.007    -2.22 ± 0.78    0.2-0.6    770           0.57
25°    0.066 ± 0.011    -1.81 ± 0.28    0.2-0.5    770           0.58
10°    0.113 ± 0.029    -1.09 ± 0.20    0.2-0.6    850           0.86
10°    0.104 ± 0.028    -1.26 ± 0.38    0.2-0.6    900           0.63

S_min = Smallest flux value.
1 $w(\theta)$ is fitted by a power law of the form $A\theta^{-\gamma}$.

We construct 100 mock catalogues using the procedure
described in Section II B, with flux values above the
sensitivity limit of the data catalogue and a chosen galactic
cut of 10° applied. By cross-correlating the data with
the 100 mocks, we produce a set of normally distributed
estimates of the correlation function. The mean and the
standard deviation of this distribution are used as the
estimate and its uncertainty in the measurement
of $w(\theta)$ at each $\theta$ (see footnote 1). The estimate (mean) and its
uncertainty (the standard deviation) are shown in Figure 3
as the black squares and their error bars. Similarly, we
correlated the 100 mocks with themselves. This result
corresponds to the yellow shaded region shown in
Figure 3 and, as expected for a Poissonian distribution, $w(\theta)$
is consistent with zero.
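
A minimal sketch of the mock procedure follows, assuming the sphere point picking of Eqs. (3)-(7) and a user-supplied acceptance function for the survey boundaries. The names `random_sphere_points`, `accept` and `mock_mean_std` are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption; the text does not say which definition was used.

```python
import numpy as np

def random_sphere_points(n, rng, accept=None):
    """Draw n unit vectors uniformly on the sphere, optionally inside a mask.

    accept, if given, takes an (m, 3) array of unit vectors and returns a
    boolean array of length m; rejected points are redrawn. An accept
    function that rejects everything would make this loop run forever.
    """
    kept, total = [], 0
    while total < n:
        u, v = rng.random(n), rng.random(n)
        theta = 2.0 * np.pi * u            # Eq. (6)
        z = 2.0 * v - 1.0                  # cos(phi), from Eq. (7)
        r = np.sqrt(1.0 - z**2)
        xyz = np.column_stack((r * np.cos(theta), r * np.sin(theta), z))
        if accept is not None:
            xyz = xyz[accept(xyz)]
        kept.append(xyz)
        total += len(xyz)
    return np.vstack(kept)[:n]

def mock_mean_std(estimates):
    """Per-bin mean and sample standard deviation over a set of w(theta) estimates.

    estimates has shape (n_mocks, n_bins), one row per mock catalogue.
    """
    est = np.asarray(estimates, dtype=float)
    return est.mean(axis=0), est.std(axis=0, ddof=1)
```

With 100 mocks, `estimates` would hold 100 rows, one $w(\theta)$ curve per data-mock correlation, and the two returned arrays would play the role of the black squares and their error bars.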

We find that a single power law of the form $w(\theta) =
A\theta^{-\gamma}$ [5^|, where $A$ is a measure of the amplitude of the
average enhancement of the number of radio sources at a
particular point in the sky, fits the data well. We present
our measurements in Table IV. We also calculate $w(\theta)$
for flux-density limits of 770 mJy, 850 mJy and
900 mJy. As shown in Table IV and Figure 5, the
amplitude of clustering does not depend on flux density. The
same result was observed in previous angular correlation
analyses (e.g., [48]).
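
One way to fit the power law is a weighted linear least-squares fit in log space, sketched here. The text does not state which fitting method was used or whether the quoted $\chi^2$ is reduced, so both choices are assumptions, as is the function name `fit_power_law`. The sketch returns $\gamma$ in the convention $w(\theta) = A\theta^{-\gamma}$ and requires positive $w(\theta)$ values; it does not settle how that sign convention relates to the values listed in Table IV.

```python
import numpy as np

def fit_power_law(theta, w, sigma):
    """Fit w = A * theta**(-gamma) by weighted least squares on log(w) vs log(theta).

    theta, w and sigma are arrays of equal length with w > 0; sigma is the
    one-sigma uncertainty on w. Returns (A, gamma, reduced chi-square).
    """
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(w, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    x, y = np.log(theta), np.log(w)
    weight = (w / sigma) ** 2              # 1 / var(log w), first-order propagation

    # Closed-form weighted linear regression y = a + b * x.
    s, sx, sy = weight.sum(), (weight * x).sum(), (weight * y).sum()
    sxx, sxy = (weight * x * x).sum(), (weight * x * y).sum()
    det = s * sxx - sx**2
    b = (s * sxy - sx * sy) / det
    a = (sxx * sy - sx * sxy) / det

    amplitude, gamma = np.exp(a), -b
    model = amplitude * theta ** (-gamma)
    chi2_red = np.sum(((w - model) / sigma) ** 2) / (len(theta) - 2)
    return amplitude, gamma, chi2_red
```

Restricting `theta` to the 0.2°-0.6° range before calling the function would mirror the angular range listed in Table IV.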

Some other interesting results can be taken from
Figure 3: (1) At large angular separations, $\theta \gg 2^\circ$, $w(\theta)$ is
consistent with zero - this is strong evidence for a high
degree of uniformity in the survey. (2) At small
angular separations, $\theta < 0.2^\circ$, there is a fall-off (or a break)
in the value of $w(\theta)$ - this effect is due to the failure of
the survey to resolve weak double sources with
separations slightly greater than the beamwidth. [60] presents
a detailed explanation of this effect and shows, in detail,
how this calculation is done. (3) Finally, at the angular
separation of $\theta \approx 1.45^\circ$, there is an unexplained increase
in the value of $w(\theta)$. It is well studied and reported in
the literature that instrumental effects in radio surveys
manifest themselves on particular characteristic scales,
and are usually rendered transparent by the $w(\theta)$
analysis (see, e.g., [66]). If the anomaly described above is
caused by such effects, this is something that should be
carefully studied, but it is outside the scope of this paper.

1 Data points in a plot of $w(\theta)$ are not independent, i.e., single
sources can contribute pairs in more than one bin. Therefore,
standard Poisson error bars will underestimate the true error in
each bin.

FIG. 5: Measured values of $A$ and $\gamma$ for flux-density
limits of 770 mJy, 850 mJy and 900 mJy. Note that
the amplitude of clustering does not depend on flux density.

IV. DISCUSSION 

In order to construct accurate simulations at the me- 
tre wavelengths, the angular correlation of radio sources 
must be taken into account. The relative importance of 
the clustering contribution increases and may eventually 
become dominant if sources are identified and subtracted 
down to faint flux limits - which are exactly the limits in- 
volved in the point source removal of 21 cm experiments. 

Using the 74 MHz VLSS survey, we measured the
angular 2-point correlation function, $w(\theta)$. We obtain the
first measurement of clustering at the low frequencies
relevant to 21 cm tomography. We find that a single power
law of the form $w(\theta) = A\theta^{-\gamma}$ fits the data well. For a
galactic cut of $|b| > 10^\circ$, with a data cut of $\delta > -10^\circ$,
and a flux limit of S = 770 mJy, we obtain a slope of
$\gamma = -1.2 \pm 0.35$ with $\chi^2 = 0.62$. This value of $\gamma$ is consistent
with that measured from other radio catalogues (see
Table III). The amplitude of clustering has a length of 0.2°-
0.6°, and it is independent of the flux-density threshold.